
    Collaborative editing of knowledge resources for cross-lingual text mining

    Smoothly handling textual documents written in different languages is an increasingly relevant issue in modern text mining environments. Research in this field has recently been fostered by Web users' need to easily search and browse the growing amount of heterogeneous multilingual content available online, as well as by the related spread of the Semantic Web. A common approach to cross-lingual text mining relies on the exploitation of properly structured multilingual knowledge resources. The involvement of huge communities of users spread over different locations is a valuable aid in creating, enriching, and refining these knowledge resources, and collaborative editing Web environments are usually exploited for this purpose. This thesis analyzes the features of several knowledge editing tools, both semantic wikis and ontology editors, and discusses the main challenges in the design and development of this kind of tool. It then presents the design, implementation, and evaluation of the Wikyoto Knowledge Editor, also called Wikyoto. Wikyoto is a collaborative editing Web environment that enables Web users without any knowledge engineering background to edit the multilingual network of knowledge resources exploited by KYOTO, a cross-lingual text mining system developed in the context of the KYOTO European Project. To gain real benefits from the social editing of knowledge resources, it is important to provide common Web users with simplified and intuitive interfaces and interaction patterns: users need to be motivated and properly guided so as to supply information useful for cross-lingual text mining. In addition, managing and coordinating their concurrent editing actions raises significant technical issues. The design of Wikyoto takes all these requirements into account, together with the structure and the set of knowledge resources exploited by KYOTO. Wikyoto aims at enabling common Web users to formalize cross-lingual knowledge through simplified, language-driven interactions, while generating the complex knowledge structures needed by computers to mine information from textual content. The learning curve of Wikyoto has been kept as shallow as possible by hiding the complexity of the knowledge structures from the users, both by enhancing the simplicity and interactivity of the knowledge editing patterns and by using natural language interviews to carry out the most complex knowledge editing tasks. In this context, TMEKO, a methodology that supports users in easily formalizing cross-lingual information through natural language interviews, has been defined. The collaborative creation of knowledge resources has also been evaluated in Wikyoto.

    Is this tweet satirical? An automatic method for identifying satirical language in Spanish

    Computational approaches to the analysis of figurative language are attracting growing interest in Computational Linguistics. In this paper, we study the characterization of Twitter messages in Spanish that advertise satirical news, and we present and evaluate a system able to classify tweets as satirical or not. To this end, we concentrate on the tweets published by several satirical and non-satirical Twitter accounts. We model the text of each tweet with a set of linguistically motivated features that aim at capturing the style, rather than the content, of the message. Our experiments show that our model consistently outperforms a word-based (bag-of-words) baseline. We also show that our system captures global features of satirical language: it is able to detect whether or not a tweet contains satirical content independently of the account that generated the tweet. The research described in this paper is partially funded by the SKATER-UPF-TALN project (TIN2012-38584-C06-03).
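
    As a rough illustration of the style-over-content idea, the sketch below maps a tweet to a handful of shallow stylistic features; the concrete features, class name, and example tweet are assumptions for illustration, not the paper's published feature set.

        import java.util.Arrays;

        // Minimal sketch of style-oriented feature extraction; the features
        // below are illustrative assumptions, not the paper's feature set.
        public class StyleFeatures {

            /** Maps a tweet to a small stylistic feature vector. */
            public static double[] extract(String tweet) {
                String[] tokens = tweet.trim().split("\\s+");
                double length = tokens.length;
                double avgWordLen = 0, upperRatio = 0, punctCount = 0;
                for (String t : tokens) avgWordLen += t.length();
                avgWordLen /= Math.max(length, 1);
                for (char c : tweet.toCharArray()) {
                    if (Character.isUpperCase(c)) upperRatio++;
                    if ("!?¡¿…".indexOf(c) >= 0) punctCount++;
                }
                upperRatio /= Math.max(tweet.length(), 1);
                return new double[] { length, avgWordLen, upperRatio, punctCount };
            }

            public static void main(String[] args) {
                double[] f = extract("¡INCREÍBLE! El gobierno prohíbe los lunes...");
                System.out.println(Arrays.toString(f));
            }
        }

    Such a vector can then be fed to any standard classifier in place of a bag-of-words representation.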

    ExTaSem! Extending, Taxonomizing and Semantifying Domain Terminologies

    We introduce EXTASEM!, a novel approach for the automatic learning of lexical taxonomies from domain terminologies. First, we exploit a very large semantic network to collect thousands of in-domain textual definitions. Second, we extract (hyponym, hypernym) pairs from each definition with a CRF-based algorithm trained on manually validated data. Finally, we introduce a graph induction procedure which constructs a full-fledged taxonomy where each edge is weighted according to its domain pertinence. EXTASEM! achieves state-of-the-art results in the following taxonomy evaluation experiments: (1) hypernym discovery, (2) reconstructing gold-standard taxonomies, and (3) taxonomy quality according to structural measures. We release weighted taxonomies for six domains for the use and scrutiny of the community.
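
    As a rough sketch of the final step, weighted (hyponym, hypernym) pairs can be assembled into a weighted directed graph; the structure and data below are illustrative assumptions, not the paper's actual induction procedure.

        import java.util.*;

        // Illustrative sketch: assemble weighted (hyponym -> hypernym) edges
        // into a taxonomy graph. The weighting scheme and data are assumptions
        // for illustration, not the EXTASEM! induction algorithm itself.
        public class TaxonomyGraph {
            private final Map<String, Map<String, Double>> edges = new HashMap<>();

            /** Adds a hyponym -> hypernym edge weighted by domain pertinence. */
            public void addEdge(String hyponym, String hypernym, double weight) {
                edges.computeIfAbsent(hyponym, k -> new HashMap<>())
                     .merge(hypernym, weight, Math::max); // keep the strongest evidence
            }

            /** Returns the best-supported hypernym for a term, if any. */
            public Optional<String> bestHypernym(String term) {
                return edges.getOrDefault(term, Map.of()).entrySet().stream()
                        .max(Map.Entry.comparingByValue())
                        .map(Map.Entry::getKey);
            }

            public static void main(String[] args) {
                TaxonomyGraph g = new TaxonomyGraph();
                g.addEdge("neural network", "machine learning model", 0.92);
                g.addEdge("neural network", "graph", 0.35);
                System.out.println(g.bestHypernym("neural network")); // Optional[machine learning model]
            }
        }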

    Summarization and information extraction on your tablet

    In this article we present a Web-based demonstration of on-line text summarization and information extraction technology. News summarization in Spanish has been implemented in a system that monitors a news provider and summarizes the latest published news. Summaries can also be generated from user-provided text in both English and Spanish. The demonstrator also features event extraction functionality, identifying the relevant concepts that characterize several types of events by mining English textual content. The application is available in a Web browser and on a tablet running the Android operating system. We acknowledge support from the Spanish research project SKATER-UPF-TALN TIN2012-38584-C06-03, the EU project Dr. Inventor FP7-ICT-2013.8.1 611383, and UPF projects PlaQUID 65 2013-2014 and PlaQUID 47 2011-2012.
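
    To make the underlying idea concrete, here is a minimal frequency-based extractive summarizer; this is a textbook baseline offered only as an illustration, not the summarization technology actually demonstrated in the article.

        import java.util.*;

        // Minimal frequency-based extractive summarizer, offered as an
        // illustrative baseline only; not the system demonstrated in the article.
        public class TinySummarizer {

            /** Returns the k sentences whose words are most frequent in the text. */
            static List<String> summarize(String text, int k) {
                String[] sentences = text.split("(?<=[.!?])\\s+");
                Map<String, Integer> freq = new HashMap<>();
                for (String w : text.toLowerCase().split("\\W+"))
                    if (w.length() > 3) freq.merge(w, 1, Integer::sum); // crude stopword filter
                return Arrays.stream(sentences)
                        .sorted(Comparator.comparingDouble(s -> -score(s, freq)))
                        .limit(k)
                        .toList();
            }

            static double score(String sentence, Map<String, Integer> freq) {
                double total = 0;
                for (String w : sentence.toLowerCase().split("\\W+"))
                    total += freq.getOrDefault(w, 0);
                return total;
            }

            public static void main(String[] args) {
                String news = "The council approved the new budget. The budget doubles "
                        + "spending on public transport. Critics call the budget unrealistic.";
                summarize(news, 1).forEach(System.out::println);
            }
        }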

    Semantic Web gets into social tagging

    During the last few years, the need to effectively and efficiently manage the enormous interconnected quantity of data on the Web has led to the introduction of new technological solutions, which have mainly affected three distinct but tightly coupled areas: the integration and interoperability of globally available content; the introduction of new, powerful methods to interact with and compose information; and users' involvement in the process of content production and their gathering around communities of interest. These fields of research have respectively given rise to three specific trends in Web information management: the Semantic Web, Web 2.0, and the Social Web. In the first part of this document, we analyze each of these approaches, describing the motivations behind their introduction and their fundamental features. In particular, we explore the Semantic Web in more detail because of the completely new point of view it introduces in the representation of Web information, which affects all other Web aspects; we point out its main application fields, give some practical examples, and discuss its future perspectives. Considering the Semantic Web, Web 2.0, and the Social Web together, we explore their common or complementary aspects, attempting to stress their possible synergies. We then consider another relevant, related question: the diffusion of lexical resources over the Web, focusing our attention on WordNet; lexical resources increasingly represent an important reference for supporting and better structuring the organization of Web content. In the second part of this document, we introduce a new possibility for managing and organizing information about Web resources: semantic tagging. Starting from the recent growing diffusion of tagging as a powerful way to socially collect different kinds of descriptive information about Web resources through the association of one or more words called tags, we analyze the possibility of providing semantic support to this task. We identify and examine the fundamental weaknesses of current tagging systems, arguing that the introduction of semantics into collaborative tagging can solve, or at least reduce, a considerable part of them. Indeed, semantically tagged Web resources provide a stronger organizational structure for the information collected through the metadata thus produced. We consider the current obstacles to the availability of an extensible and coherent semantic support for semantic tagging, identifying ways of exploiting existing Web resources to perform this task. We have developed SemKey, a semantic tagging system that makes it possible to exploit the main advantages of introducing semantics into the tagging process. After describing the architecture and implementation of our collaborative semantic tagging system, we evaluate its advantages over current tagging systems in terms of information structuring and retrieval, and we point out some relevant suggestions for future work.
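
    As an illustration of the core idea, a semantic tag binds a Web resource to an unambiguous concept (for example a WordNet synset) rather than to a free-text string; the record layout and synset IDs below are assumptions for illustration, not SemKey's actual data model.

        // Illustrative sketch of a semantic tag: the resource is annotated with
        // an unambiguous concept identifier (here a WordNet synset ID) instead
        // of a raw string, so "bank" (river) and "bank" (finance) no longer
        // collide. Layout and IDs are illustrative, not SemKey's actual schema.
        public record SemanticTag(String resourceUri, String synsetId, String user) {

            public static void main(String[] args) {
                SemanticTag riverside = new SemanticTag(
                        "http://example.org/photo42", "wn:09213565-n", "alice"); // bank: sloping land
                SemanticTag finance = new SemanticTag(
                        "http://example.org/photo43", "wn:08420278-n", "bob");   // bank: institution
                System.out.println(riverside + "\n" + finance);
            }
        }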

    HTTP Request Scheduler

    In recent years, the Web has been reshaped around the concept of offering and requesting web services, creating a large distributed system of which these services are the main building blocks. The purpose of this study is to add a particular component to the world of web services: an HTTP request scheduling service that can be instructed, through a SOAP or REST interface, to query or control other services via HTTP at an established time. The power of cron and other desktop schedulers can thus be exploited to offer new kinds of services, opening up new possibilities: saving a history of a newspaper's front pages, performing resource-tracking activities, saving frames taken periodically by a site that manages a webcam, or activating and deactivating other services at a given time. We outlined the architecture of an HTTP Request Scheduler (HRS), implemented a working prototype in Java using the Quartz scheduling framework, and defined a specific XML language to instruct the component.
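
    A minimal sketch of the kind of job such a scheduler runs: Quartz fires a job on a cron-like schedule, and the job issues an HTTP GET. The class names, URL, and schedule are illustrative assumptions, not the thesis' actual implementation.

        import java.net.URI;
        import java.net.http.HttpClient;
        import java.net.http.HttpRequest;
        import java.net.http.HttpResponse;

        import org.quartz.*;
        import org.quartz.impl.StdSchedulerFactory;

        // Illustrative sketch of a scheduled HTTP request built on Quartz;
        // names, URL, and schedule are assumptions, not the HRS prototype code.
        public class HttpFetchJob implements Job {

            @Override
            public void execute(JobExecutionContext ctx) throws JobExecutionException {
                try {
                    String url = ctx.getJobDetail().getJobDataMap().getString("url");
                    HttpResponse<String> resp = HttpClient.newHttpClient().send(
                            HttpRequest.newBuilder(URI.create(url)).GET().build(),
                            HttpResponse.BodyHandlers.ofString());
                    System.out.println(url + " -> HTTP " + resp.statusCode());
                } catch (Exception e) {
                    throw new JobExecutionException(e);
                }
            }

            public static void main(String[] args) throws SchedulerException {
                Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
                JobDetail job = JobBuilder.newJob(HttpFetchJob.class)
                        .usingJobData("url", "https://example.org/frontpage")
                        .build();
                // Fire every day at 06:00, e.g. to archive a newspaper's front page.
                Trigger trigger = TriggerBuilder.newTrigger()
                        .withSchedule(CronScheduleBuilder.cronSchedule("0 0 6 * * ?"))
                        .build();
                scheduler.scheduleJob(job, trigger);
                scheduler.start();
            }
        }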

    Formalizing Knowledge by Ontologies: OWL and KIF

    In recent years, the activities of knowledge formalization and sharing that enable semantically aware management of information have been attracting growing attention, especially in distributed environments like the Web. In this report, after a general introduction to the basics of knowledge abstraction and its formalization through ontologies, we briefly present a list of relevant formal languages used to represent knowledge: CycL, FLogic, LOOM, KIF, Ontolingua, RDF(S) and OWL. We then focus our attention on the Web Ontology Language (OWL) and the Knowledge Interchange Format (KIF). OWL is the main language used to describe and share ontologies over the Web; its three sublanguages (OWL Lite, OWL DL, and OWL Full) offer increasing degrees of expressiveness. We describe its structure as well as the way it is used to reason over asserted knowledge. Moreover, we briefly present three relevant OWL ontology editors, Protégé, SWOOP and OntoTrack, and two important OWL reasoners, Pellet and FaCT++. KIF is mainly a standard for describing knowledge so that it can be exchanged among different computer systems. We describe the main elements of the KIF syntax, and we also consider Sigma, an environment for creating, testing, modifying, and performing inference with KIF ontologies. We comment on some meaningful examples of both OWL and KIF ontologies and, in conclusion, compare their main expressive features.
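
    As a small taste of what an OWL ontology expresses, here is a generic sketch built with Apache Jena's OntModel API (the namespace and class names are illustrative assumptions, not an example from the report): two classes and a subclass axiom, serialized as RDF/XML.

        import org.apache.jena.ontology.OntClass;
        import org.apache.jena.ontology.OntModel;
        import org.apache.jena.ontology.OntModelSpec;
        import org.apache.jena.rdf.model.ModelFactory;

        // Generic sketch of building a tiny OWL ontology with Apache Jena;
        // the namespace and class names are illustrative assumptions.
        public class TinyOntology {
            public static void main(String[] args) {
                String ns = "http://example.org/onto#";
                OntModel model = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
                OntClass person = model.createClass(ns + "Person");
                OntClass student = model.createClass(ns + "Student");
                student.addSuperClass(person); // every Student is a Person
                model.write(System.out, "RDF/XML"); // a standard OWL serialization
            }
        }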

    Stimulating and Simulating Creativity with Dr Inventor

    Dr Inventor is a system that is at once a computational model of creative thinking and a tool to ignite the creative process in its users. Dr Inventor uncovers creative bisociations between semi-structured documents such as academic papers, patent applications and psychology materials, adopting a “big data” perspective to discover creative comparisons. The Dr Inventor system is described with a focus on the transformation of this textual information into the graph structure required by the creative cognitive model. Results are described using data from both psychological test materials and published research papers. The operation of Dr Inventor for both focused creativity and open-ended creativity is also outlined.
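
    To illustrate what comparing documents as graphs can look like, the sketch below measures the overlap between two sets of subject-relation-object edges; the representation and the Jaccard measure are simplifying assumptions for illustration, not Dr Inventor's actual cognitive model.

        import java.util.HashSet;
        import java.util.Set;

        // Illustrative sketch of comparing two documents represented as graphs
        // of subject-relation-object edges; the Jaccard overlap used here is a
        // simplifying assumption, not Dr Inventor's actual comparison model.
        public class GraphComparison {

            /** An edge of a concept graph: subject --relation--> object. */
            record Edge(String subject, String relation, String object) {}

            /** Jaccard overlap between the edge sets of two concept graphs. */
            static double similarity(Set<Edge> a, Set<Edge> b) {
                Set<Edge> inter = new HashSet<>(a);
                inter.retainAll(b);
                Set<Edge> union = new HashSet<>(a);
                union.addAll(b);
                return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
            }

            public static void main(String[] args) {
                Set<Edge> paperA = Set.of(new Edge("algorithm", "segments", "image"),
                                          new Edge("mesh", "approximates", "surface"));
                Set<Edge> paperB = Set.of(new Edge("algorithm", "segments", "image"),
                                          new Edge("graph", "models", "text"));
                System.out.printf("overlap = %.2f%n", similarity(paperA, paperB)); // 0.33
            }
        }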
